Introduction

City Pop vs U.S. Pop

In this portfolio, I will explore the differences and features of two genres, City Pop (CP) and U.S. Pop (UP), originating from Japan and the U.S. respectively. CP is a Japanese genre that appeared in the late 70s and peaked in popularity in the 80s. I will compare two playlists of three artists each: one Japanese playlist and one U.S. playlist. The Japanese group consists of Taeko Onuki, Miki Matsubara and Anri; their U.S. counterparts are Michael Jackson, Whitney Houston and Madonna. I chose these corpora because I want to explore whether there are distinct differences between (city) pop as it was in Japan in the 80s and the pop that was popular in the Western world in the same decade. Japanese CP was influenced by Western music, so I expect many similarities in the use of sound, instruments and types of rhythm. In particular, I am interested in whether the two genres differ in the prevalence of bass and in tempo. It is also interesting to see whether they differ in other aspects, such as timbre. However, I am unsure to what extent they differ, which is what this portfolio explores.

I chose this topic because of my personal taste. Ever since I was a child, pop has been a significant part of my life and upbringing, as it was perhaps the main genre both my parents listened to. Thanks to its rising popularity on the internet, CP has gained a lot of traction and has even spawned new sub-genres, e.g., future funk. I think this is due to its similarity to Western pop, as well as to how modern many of its songs still sound by today's standards.

As I have chosen three artists to represent each genre, there might be nuances and varieties I am missing. Taeko Onuki, Miki Matsubara and Anri were chosen due to their popularity on Spotify (the number of listeners as well as plays of their tracks). I should also mention that the selection was partly personal. Nevertheless, it was centered on three albums from each artist, and the same method was used to choose the Western counterparts. Both genres are very broad, despite their popularity, so some varieties might have been overlooked. On the other hand, the artists' popularity is a strength, as many casual listeners will know these songs.

Typical, and popular, tracks from the Japanese playlist are:

These songs are typical in that they make prominent use of basslines and clear rhythms, are stereotypically pop, and have rich timbres and many layers of sound, e.g., instruments.

The western counterparts have typical tracks like:

These last three tracks especially have the typical, distinct features of 80s pop, namely the sharp drums, the heavily synthesized piano sounds and, in my opinion, an almost “dreamy” sound.

Atypical songs from both playlists can include:

To explore possible differences and features of this corpus, I will first build a random forest classification model. Then I will explore track-level features of the two genres, focusing on the features the classification model found most important. After that, I will look more closely at finer-grained musical features such as timbre and chroma, using self-similarity matrices, chromagrams and chordograms. Finally, I will discuss what this portfolio has explored and conclude what can be derived from it.

Classification

The classifier is better than expected at categorizing each genre


# A tibble: 2 × 3
  class    precision recall
  <fct>        <dbl>  <dbl>
1 City Pop     0.703  0.839
2 U.S. Pop     0.8    0.645

To train the model, I capped each playlist at 31 songs. Accuracy is computed with the formula (TP+TN)/total, according to this website. From the recall values above, 26 of the 31 City Pop tracks and 20 of the 31 U.S. Pop tracks were classified correctly, so the accuracy is (26+20)/62 ≈ 0.74. According to various sources, like this one, this is a relatively good accuracy while still being within a realistic range.
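The accuracy can be checked against the first table. Since recall = correctly classified / class size, multiplying each recall by the class size (31 tracks, as stated above) and rounding recovers the number of correct tracks per class. This is a minimal sketch; the function names are hypothetical and not part of the model code used in this portfolio.

```python
def correct_counts(recalls, class_size):
    """Recover how many tracks per class were classified correctly.

    recall = correct / class_size, so correct = recall * class_size,
    rounded because the tibble reports recall to three decimals.
    """
    return {label: round(r * class_size) for label, r in recalls.items()}


def accuracy(correct, total):
    # (TP + TN) / total: all correctly classified tracks over all tracks
    return sum(correct.values()) / total


# Recall values from the first tibble, with 31 tracks per playlist
recalls = {"City Pop": 0.839, "U.S. Pop": 0.645}
correct = correct_counts(recalls, class_size=31)
print(correct)                          # {'City Pop': 26, 'U.S. Pop': 20}
print(round(accuracy(correct, 62), 3))  # 0.742
```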

From this, I conclude that the model is reasonably good at telling City Pop and U.S. Pop apart. In the next section, I examine which features were the most important in classifying these playlists.

Of these features, the track-level feature loudness and the timbre coefficients are the most important for classifying CP and UP.


# A tibble: 2 × 3
  class    precision recall
  <fct>        <dbl>  <dbl>
1 City Pop     0.828  0.774
2 U.S. Pop     0.788  0.839
          Truth
Prediction City Pop U.S. Pop
  City Pop       24        5
  U.S. Pop        7       26

I hope this is the right way to obtain the confusion matrix from the random forest model. Using the same formula as on the previous slide, the accuracy is (24+26)/62 ≈ 0.81, slightly higher than the first model.
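One way to confirm that the confusion matrix was read correctly is to recompute precision, recall and accuracy from it and compare them with the tibble above. The sketch below assumes the layout printed above (rows = prediction, columns = truth); `class_metrics` is a hypothetical helper, not a function from the modelling package.

```python
# Confusion matrix from the second model: rows = prediction, columns = truth
LABELS = ["City Pop", "U.S. Pop"]
CONFUSION = [
    [24, 5],   # predicted City Pop: 24 truly City Pop, 5 truly U.S. Pop
    [7, 26],   # predicted U.S. Pop: 7 truly City Pop, 26 truly U.S. Pop
]


def class_metrics(matrix, labels):
    """Per-class precision and recall, plus overall accuracy."""
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted = sum(matrix[i])                    # row sum
        actual = sum(matrix[r][i] for r in range(n))  # column sum
        metrics[label] = {
            "precision": true_pos / predicted,
            "recall": true_pos / actual,
        }
    correct = sum(matrix[i][i] for i in range(n))     # diagonal = TP + TN
    return metrics, correct / total


metrics, acc = class_metrics(CONFUSION, LABELS)
for label, m in metrics.items():
    print(f"{label}: precision={m['precision']:.3f} recall={m['recall']:.3f}")
print(f"accuracy={acc:.3f}")
```

The printed precision (0.828, 0.788) and recall (0.774, 0.839) match the second tibble, which suggests the matrix was extracted correctly; the accuracy comes out at 0.806.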

Regarding feature selection, the graph shows that timbre coefficient c11, followed by the track-level feature loudness and then timbre coefficient c1, are the most important features for classifying the two genres. It is somewhat amusing that both loudness and c1 rank so high, since, as mentioned in one of the lectures, this timbre coefficient is roughly equivalent to loudness. After those, mainly timbre features are the most important. This is quite interesting, as at the very start of the course (without any knowledge of musical terminology) I said that City Pop songs have a lot of layers to them. By this I meant that “there is a lot going on” and that they sound different. This is probably why

For the final portfolio, I will use these features to improve my existing visualizations, which will also let me remove some of them. Yay!

Graphs

CP is louder than UP


Here one can see the temporal and power features (tempo and loudness) of the two genres. There is a clear preference for a higher volume in the City Pop group than in the U.S. Pop group. I made this graph as a direct result of the feature importance analysis in the previous section. While the two genres differ distinctly in loudness, both tend to stay around the same tempo, i.e., 120-125 bpm. Coincidentally, this corresponds well to the study by Moelants (2002), which suggests that humans seem to prefer this tempo.

However, many tracks do not follow this pattern: there are points in the 90-120 bpm range in CP and in the 90-150 bpm range in UP. One could perhaps say that UP has more tracks right at 120 bpm.

In terms of loudness, UP stops just above -9 dB, whereas CP goes further, with multiple tracks above -6 dB. If I am interpreting this correctly, this means that CP is louder than UP.
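Because Spotify reports loudness in decibels, which are logarithmic, the gap between about -9 dB and -6 dB is larger than the numbers suggest. The sketch below treats the values as plain dB levels; Spotify's loudness is an average over the whole track, and perceived loudness does not scale exactly with power, so the ratios are only indicative.

```python
def db_to_power_ratio(delta_db):
    # Decibels are logarithmic: +10 dB means ten times the power
    return 10 ** (delta_db / 10)


def db_to_amplitude_ratio(delta_db):
    # Amplitude ratios use 20 * log10, so +20 dB means ten times the amplitude
    return 10 ** (delta_db / 20)


# Gap between the loudest UP tracks (about -9 dB) and CP tracks above -6 dB
gap = -6 - (-9)
print(round(db_to_power_ratio(gap), 2))      # 2.0
print(round(db_to_amplitude_ratio(gap), 2))  # 1.41
```

In other words, a 3 dB difference corresponds to roughly double the power, or about 1.4 times the signal amplitude.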

Other track level features single out some outliers.


Here is a plot of the effect of energy on danceability, where the size of the points, as well as the band around the line, indicates the tempo of the songs. U.S. Pop shows more of a linear trend: up to an energy of around 0.5, energy and danceability seem to be correlated. Tempo, however, shows no pattern at first glance. For the City Pop playlist, there is a slight curve at the beginning of the graph, but overall the trend is very flat, i.e., energy has little effect on danceability. Here too, there is no obvious trend for tempo at first glance.

However, one can tell that U.S. Pop has more of a positive linear correlation between energy and danceability, at least more than its Eastern counterpart.
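A linear trend like this can be quantified with the Pearson correlation coefficient. The sketch below uses made-up (energy, danceability) values, not the playlist data, and also computes the correlation only for tracks below an energy of 0.5, mirroring the observation that the U.S. Pop trend holds only up to that point. The helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pearson_below(xs, ys, cutoff):
    # Keep only tracks whose energy (x) lies below the cutoff
    pairs = [(x, y) for x, y in zip(xs, ys) if x < cutoff]
    return pearson([x for x, _ in pairs], [y for _, y in pairs])


# Hypothetical values, NOT taken from either playlist
energy = [0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
dance = [0.40, 0.50, 0.60, 0.65, 0.62, 0.60, 0.63, 0.61]
print(round(pearson_below(energy, dance, 0.5), 3))  # 1.0
print(round(pearson(energy, dance), 3))
```

With these made-up values, the correlation is perfect below 0.5 but noticeably weaker over the whole range, which is the kind of pattern the plot suggests for U.S. Pop.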

Two outliers, one in each group, especially draw one’s eye: “Billie Jean” by Michael Jackson and “横顔” (Yokogao) by Taeko Onuki. Compared with the other songs, both have very low energy while at the same time having high danceability.

Chromagrams of Outliers

Billie Jean - Michael Jackson

Commentary

“Billie Jean” was listed as an outlier for the U.S. Pop group in the graph of energy’s effect on danceability: Spotify rates it as a low-energy song, yet its danceability is extremely high. As this chromagram shows, “Billie Jean” has several areas where the magnitude is over 0.75. One pattern that stands out is the use of D and C#/Db just before 100 and 200 seconds, where they form an almost pyramid-like shape. Nevertheless, the chromagram suggests a lot of activity in the pitch content, even though Spotify rates the song as low in energy.
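A magnitude “over 0.75” only makes sense relative to some normalization. The sketch below assumes each chroma frame is scaled so that its strongest pitch class equals 1 (maximum, or Chebyshev, normalization), which is a common choice for chromagrams; the exact normalization used for this plot is not stated. The frame values are hypothetical, not measured from “Billie Jean”.

```python
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def max_normalize(chroma):
    """Scale a 12-bin chroma vector so its strongest pitch class is 1."""
    peak = max(chroma)
    return [v / peak for v in chroma] if peak > 0 else list(chroma)


def strong_pitch_classes(chroma, threshold=0.75):
    # Pitch classes whose normalized magnitude exceeds the threshold
    norm = max_normalize(chroma)
    return [name for name, v in zip(PITCH_CLASSES, norm) if v > threshold]


# Hypothetical raw chroma frame, NOT measured from the song
frame = [0.10, 0.72, 0.80, 0.05, 0.12, 0.30, 0.20, 0.15, 0.10, 0.25, 0.05, 0.40]
print(strong_pitch_classes(frame))  # ['C#/Db', 'D']
```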

Self-Similarity Matrices

Tempi

Histogram of tempi


Here is the distribution of tempi within my corpus, divided between the Japanese group and the U.S. group. The graphs show distinct differences in preferred tempo: in the U.S. group, the tempi cluster around 120 bpm. Interestingly, this corresponds well to research suggesting that 120-125 bpm is a natural tempo for humans and that this natural tempo is linked to the tempi found in music. In my corpus, however, this is not the case for the Japanese group. Its distribution is more even, with more songs spread across the 100-135 bpm range, although slightly more songs lie around 105 bpm.

Aside from the obvious archetypes in this distribution, there are two outliers, one from each group: “A HOPE FROM SAD STREET” by Anri and “Love Is a Contact Sport” by Whitney Houston. Their tempi lie around 180 and 175 bpm, respectively.
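A tempo histogram simply counts tracks per tempo bin, and an outlier shows up as an occupied bin with empty neighbours. The sketch below uses hypothetical tempi (not the actual playlist values) and an assumed bin width of 5 bpm; `isolated_bins` is a hypothetical helper for spotting such bins.

```python
from collections import Counter


def tempo_histogram(tempi, bin_width=5):
    """Count tracks per tempo bin, keyed by each bin's lower edge in bpm."""
    counts = Counter(int(t // bin_width) * bin_width for t in tempi)
    return dict(sorted(counts.items()))


def isolated_bins(histogram, bin_width=5):
    # Bins with no occupied neighbour on either side: candidate outliers
    return [b for b in histogram
            if b - bin_width not in histogram and b + bin_width not in histogram]


# Hypothetical tempi in bpm, NOT the actual playlist values
us_pop = [108.0, 112.5, 118.0, 119.5, 120.1, 121.0, 122.4, 175.4]
hist = tempo_histogram(us_pop)
print(hist)                 # {105: 1, 110: 1, 115: 2, 120: 3, 175: 1}
print(isolated_bins(hist))  # [175]
```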

Billie Jean - our atypical song


Here is the tempogram of the outlier identified in the graph of energy’s effect on danceability, where energy is low but danceability is high. This is a very clean tempogram: there are no disturbances, and nothing seems to have gone wrong in the tempo estimation. “Billie Jean” is a very stereotypical song in this regard, with a tempo of approximately 120 bpm and a very steady beat, rhythm and tempo that do not seem to change. This is probably part of the reason its danceability is very high, despite its very low energy.

A Hope From Sad Street - An Outlier in Distribution


The histogram of tempi showed a couple of outliers in both genres, and “A HOPE FROM SAD STREET” by Anri is one of them. Compared with “Billie Jean”, there were apparently some issues when estimating the tempo, as multiple yellow lines flicker across the entire piece. However, it was not a complete failure: the tempogram shows not one but two tempo octaves. It is not as stable as the previous piece, but the lines are clearly there.

This flickering could be attributed to the way the song is realized: there are many layers of sound, with not only the bassline but also a layer of trumpet and a choir, as well as points where the song stops and picks up again. The bassline is often replaced by piano and guitar solos before returning to the “status quo”. As far as I can tell, these moments correspond to the disturbances in the tempogram.

Love Is a Contact Sport - Western Outlier in Distribution


In contrast to Anri’s song, “Love Is a Contact Sport” by Whitney Houston does not have any issues at all, as the image clearly shows. The song is also realized very clearly, with a steady tempo throughout. It was also one of the outliers in the tempo histogram, as the Western counterpart to Anri’s song. Like Anri’s song, its tempogram shows two tempo octaves.
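Two lines in a tempogram form tempo octaves when one tempo is (close to) double or half the other. The sketch below measures the distance between two tempi in octaves with a base-2 logarithm; the 0.05-octave tolerance and the 90 bpm lower line are hypothetical choices for illustration, not values read from these tempograms.

```python
from math import log2


def octave_distance(tempo_a, tempo_b):
    """Distance between two tempi in octaves: 1.0 means one doubling."""
    return abs(log2(tempo_a / tempo_b))


def is_tempo_octave(tempo_a, tempo_b, tolerance=0.05):
    """True if the tempi differ by a whole number (>= 1) of doublings.

    The tolerance is measured in octaves and is an illustrative choice.
    """
    distance = octave_distance(tempo_a, tempo_b)
    octaves = round(distance)
    return octaves >= 1 and abs(distance - octaves) <= tolerance


# Hypothetical pair of tempogram lines: about 180 bpm and a lower line at 90 bpm
print(is_tempo_octave(180, 90))   # True: exactly one octave apart
print(is_tempo_octave(180, 120))  # False: a 3:2 ratio, not an octave
```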

Interesting find - Cat’s Eye by Anri


Once again, here is a song by Anri, “CAT’S EYE - (NEW TAKE)”. I included it because the algorithm estimates the tempo very clearly at the beginning, but at around 130 seconds there is a clear window where it fails to estimate the tempo. This is easy to hear in the song, as the bass guitar has a solo around that time. To me, the solo feels faster than the rest of the song, which the tempogram shows to some extent; however, this shift was clearly something the tempogram could not handle, which I found interesting.

Keys & Chords

Histogram of keys


As the graph shows, the keys in each corpus group are, overall, fairly evenly distributed. However, the two groups seem to prefer different keys. City Pop mostly prefers C, D, F and G, whereas U.S. Pop, while also having a decent number of songs in C, seems to prefer D#, A and B, unlike City Pop. Note, however, that the City Pop group contains a few more songs than the U.S. one, which can skew the distributions.
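Since the groups differ slightly in size, comparing the share of songs per key rather than raw counts reduces that skew. The sketch below assumes Spotify's integer key encoding (0 = C through 11 = B, with -1 when no key was detected); the key lists are hypothetical, and `key_shares` is a hypothetical helper.

```python
from collections import Counter

# Spotify's "key" audio feature uses pitch-class integers:
# 0 = C, 1 = C#/Db, ..., 11 = B, and -1 when no key was detected.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_shares(keys):
    """Share of tracks per key, ignoring tracks with no detected key (-1).

    Proportions instead of raw counts keep groups of different sizes
    comparable.
    """
    named = [PITCH_CLASSES[k] for k in keys if k >= 0]
    counts = Counter(named)
    return {name: counts[name] / len(named) for name in PITCH_CLASSES if counts[name]}


# Hypothetical key lists, NOT the actual playlist data
city_pop = [0, 0, 2, 5, 7, 7, 2, 0, 5]
us_pop = [0, 3, 9, 11, 9, 0, -1]
print(key_shares(city_pop))
print(key_shares(us_pop))
```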

Mayonaka no Door/Stay With Me (City Pop)

4:00A.M. (City Pop)

Remember Summer Days (City Pop)

I Wanna Dance With Somebody (Who Loves Me) (U.S. Pop)

Material Girl (U.S. Pop)

Thriller (U.S. Pop)

Conclusions/Summary

Differences in City Pop vs U.S. Pop

Based on my corpus, one difference between U.S. Pop and City Pop is that energy clearly seems to have an effect on danceability in U.S. Pop, up to a certain level. However, there are many outliers that can skew this, which will be identified shortly. Furthermore, tempo does not seem to have a clear relationship with either energy or danceability.

Another observation is that danceability seems to be associated with positive valence in both genres. However, U.S. Pop has more songs that are both more danceable and happier than City Pop.

This is consistent with the histogram, in which none of the City Pop songs exceed a certain valence threshold, unlike U.S. Pop. This could indicate that City Pop is generally less “happy”.

One outlier I want to discuss in particular is the track in the energy, danceability and tempo plot whose energy is quite low compared with other tracks, but whose danceability is one of the highest in the group. While I am not sure how to point this out in the plot, I have identified it as “Billie Jean” by Michael Jackson. A separate chromagram has been made to examine it.